Introduction

Objective: This project analyzes US unemployment data, focusing on disparities and patterns across races and education levels. Using data analysis and visualization, we aim to show how unemployment and its socio-economic implications differ across demographic groups.

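Dataset Overview: The dataset (unemployment_data_us.csv) contains monthly US unemployment rates from 2010 onward. It reports rates separately by race (White, Black, Asian, Hispanic), gender (Men, Women), and education level (Primary School, High School, Associates Degree, Professional Degree). The breakdowns are not cross-tabulated, so there is no column for a given race at a given education level.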

Significance: Understanding unemployment trends is crucial for policymakers, researchers, and social activists. It not only reflects the economic health of a nation but also highlights areas where interventions might be necessary to ensure equal opportunities for all.


Current Status

Data Loading and Cleaning: The dataset has been successfully loaded into the analysis environment. Preliminary data cleaning steps have been executed to ensure the data's integrity and consistency.

Exploratory Data Analysis (EDA): Initial EDA has been conducted, providing insights into the overall structure of the data. Visualizations have been generated to compare unemployment rates across races for different education levels.

Challenges: Plotting per-race subsets in 3D has been problematic: the subset dataframes carry a duplicated Year column, which breaks the scatter calls. This is being addressed to ensure accurate and meaningful visualizations.

Next Steps:

  • Address the plotting challenges to generate comprehensive visualizations.
  • Conduct a deeper statistical analysis to identify significant patterns and trends.
  • Explore potential correlations between unemployment rates and external factors (if available in the dataset or from supplementary sources).
  • Summarize findings and provide recommendations based on the analysis.

import pandas as pd

# Load the CSV dataset
data = pd.read_csv('unemployment_data_us.csv')

# Display the first few rows of the dataset
data.head()
Year Month Primary_School Date High_School Associates_Degree Professional_Degree White Black Asian Hispanic Men Women
0 2010 Jan 15.3 Jan-2010 10.2 8.6 4.9 8.8 16.5 8.3 12.9 10.2 7.9
1 2011 Jan 14.3 Jan-2011 9.5 8.1 4.3 8.1 15.8 6.8 12.3 9.0 7.9
2 2012 Jan 13.0 Jan-2012 8.5 7.1 4.3 7.4 13.6 6.7 10.7 7.7 7.6
3 2013 Jan 12.0 Jan-2013 8.1 6.9 3.8 7.1 13.7 6.4 9.7 7.5 7.2
4 2014 Jan 9.4 Jan-2014 6.5 5.9 3.3 5.7 12.1 4.7 8.3 6.2 5.8
data.head(30)
import matplotlib.pyplot as plt

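# Plotting yearly average unemployment rates for White and Black workers.
# The data is monthly, so average within each year to get one point per year.
yearly = data.groupby('Year')[['White', 'Black']].mean()
plt.figure(figsize=(14, 7))
for race in ['White', 'Black']:
    plt.plot(yearly.index, yearly[race], label=race)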

plt.title('Unemployment Rates for Different Races Over the Years')
plt.xlabel('Year')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.grid(True)
plt.show()

Unemployment Rates for Different Races Over the Years

From the previous plot, we observed that the unemployment rates for White and Black workers tend to move in tandem, indicating that broader economic factors likely influence both groups similarly. However, the Black unemployment rate is consistently higher than the White rate throughout the years.
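
The size of that gap can be checked numerically rather than read off the plot. A minimal sketch, assuming only the five January rows shown in the data.head() output above; `sample`, `Gap`, and `Ratio` are illustrative names, not part of the dataset:

```python
import pandas as pd

# The five January rows from data.head() (2010-2014), not the full dataset
sample = pd.DataFrame({
    'Year':  [2010, 2011, 2012, 2013, 2014],
    'White': [8.8, 8.1, 7.4, 7.1, 5.7],
    'Black': [16.5, 15.8, 13.6, 13.7, 12.1],
})

# Absolute gap in percentage points, and relative gap as a ratio
sample['Gap'] = sample['Black'] - sample['White']
sample['Ratio'] = sample['Black'] / sample['White']

print(sample.round(2))
```

Applying the same two lines to `data` instead of `sample` would give the gap for every month in the full dataset.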

Next, we'll plot the unemployment rates for all four racial groups in the dataset (White, Black, Asian, Hispanic) to get a comprehensive view of the disparities between them.

# Extracting all race columns from the dataset
race_columns = ['White', 'Black', 'Asian', 'Hispanic']

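# Plotting yearly average unemployment rates for every race column.
# The data is monthly, so average within each year to get one point per year.
yearly_race = data.groupby('Year')[race_columns].mean()
plt.figure(figsize=(16, 8))
for race in race_columns:
    plt.plot(yearly_race.index, yearly_race[race], label=race, linewidth=2)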

plt.title('Unemployment Rates for All Races Over the Years')
plt.xlabel('Year')
plt.ylabel('Unemployment Rate (%)')
plt.legend()
plt.grid(True)
plt.show()

Unemployment Rates Based on Education Levels for Different Races

Having observed the unemployment trends for different races, it's essential to understand how education plays a role in these rates. Education is often seen as a pathway to better job opportunities and financial stability. By analyzing unemployment rates based on education levels for each race, we can gain insights into the disparities and challenges faced by different racial groups in the job market.

In the next analysis, we'll visualize the unemployment rates for different education levels (Primary School, High School, Associates Degree, Professional Degree) for each race over the years.

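# Plotting yearly average unemployment rates by education level, one panel per race.
# NOTE: the education columns are not broken down by race, so every panel
# shows the same four series; only the panel title differs.
education_columns = ['Primary_School', 'High_School', 'Associates_Degree', 'Professional_Degree']
yearly_edu = data.groupby('Year')[education_columns].mean()

plt.figure(figsize=(18, 12))
for idx, race in enumerate(race_columns, 1):
    plt.subplot(2, 2, idx)
    for edu in education_columns:
        plt.plot(yearly_edu.index, yearly_edu[edu], label=edu, linewidth=2)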
    plt.title(f'Unemployment Rates by Education for {race} Race')
    plt.xlabel('Year')
    plt.ylabel('Unemployment Rate (%)')
    plt.legend()
    plt.grid(True)

plt.tight_layout()
plt.show()

Observations and Insights

  1. Identical Panels, Not Equality: The four panels look the same because the education columns (Primary_School, High_School, Associates_Degree, Professional_Degree) are not broken down by race, so each panel plots the same aggregate series. The apparent similarity across races, at the primary level or any other, is an artifact of the plot rather than a finding about the job market.

  2. Disparities in Higher Education: This dataset cannot show whether racial disparities widen as education increases, because it has no race-by-education columns. Testing a claim such as higher unemployment among Black professional-degree holders would require a cross-tabulated source.

  3. Education as a Buffer: In the aggregate series, higher education levels correspond to lower unemployment rates. This emphasizes the importance of education as a buffer against unemployment.
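
The ordering behind the third observation can be checked row by row. A minimal sketch, assuming only the five January rows shown in the data.head() output; `sample`, `step_changes`, and `strictly_decreasing` are illustrative names:

```python
import pandas as pd

education_columns = ['Primary_School', 'High_School',
                     'Associates_Degree', 'Professional_Degree']

# The five January rows from data.head() (2010-2014), not the full dataset
sample = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'Primary_School': [15.3, 14.3, 13.0, 12.0, 9.4],
    'High_School': [10.2, 9.5, 8.5, 8.1, 6.5],
    'Associates_Degree': [8.6, 8.1, 7.1, 6.9, 5.9],
    'Professional_Degree': [4.9, 4.3, 4.3, 3.8, 3.3],
})

# Difference between each education level and the one before it, per row;
# the first column has no predecessor, so it is dropped
step_changes = sample[education_columns].diff(axis=1).iloc[:, 1:]

# True for a row when every step up in education lowers the rate
strictly_decreasing = (step_changes < 0).all(axis=1)
print(strictly_decreasing)
```

Running the same check on `data` would flag any month where the ordering breaks.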

Additional Analysis Ideas

  • Yearly Trends: Analyze the yearly trends in unemployment for each race and education level to identify any specific years with significant changes.

  • Correlation Analysis: Investigate the correlation between different education levels and unemployment rates for each race. This can provide insights into how strongly education impacts unemployment for each racial group.

  • Comparative Analysis: Compare the unemployment rates of one racial group with the average unemployment rate of all groups combined. This can highlight specific racial groups that are above or below the average.

  • Economic Factors: Integrate external economic factors (e.g., GDP growth, recession periods) to understand their impact on unemployment rates across races and education levels.
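
The yearly-trends idea can be sketched with a year-over-year difference. The example below assumes only the five January rows from the data.head() output, so it measures January-to-January change; on the full dataset, the monthly rows would first be averaged per year. `sample`, `yoy_change`, and `largest_drop_year` are illustrative names:

```python
import pandas as pd

# January rows from data.head() (2010-2014) for two education levels
sample = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'Primary_School': [15.3, 14.3, 13.0, 12.0, 9.4],
    'Professional_Degree': [4.9, 4.3, 4.3, 3.8, 3.3],
})

# Change in percentage points relative to the previous year;
# the first year has no predecessor and is therefore NaN
yoy_change = sample.set_index('Year').diff()

# Year with the steepest decline for each education level
largest_drop_year = yoy_change.idxmin()

print(yoy_change.round(2))
print(largest_drop_year)
```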

To understand the evolution of unemployment rates over the years for each racial group, we'll visualize the yearly trends based on different education levels. This will provide insights into how specific years or periods might have impacted unemployment rates differently for each racial and educational group.

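# Plotting yearly trends in unemployment based on education levels for each race.
# The data is monthly, so average within each year to get one point per year.
yearly_edu = data.groupby('Year')[education_columns].mean()
plt.figure(figsize=(20, 14))
for idx, race in enumerate(race_columns, 1):
    plt.subplot(2, 2, idx)
    for edu in education_columns:
        plt.plot(yearly_edu.index, yearly_edu[edu], label=edu, linewidth=2)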
    plt.title(f'Yearly Unemployment Trends by Education for {race} Race')
    plt.xlabel('Year')
    plt.ylabel('Unemployment Rate (%)')
    plt.legend()
    plt.grid(True)

plt.tight_layout()
plt.show()
  1. Consistent Ordering by Education: In the aggregate series shown in every panel, individuals with only primary education face the highest unemployment rates, followed by those with a high school education, an associate's degree, and a professional degree. This trend emphasizes the importance of higher education in securing employment.

  2. Economic Downturns: There are noticeable spikes in unemployment rates around specific years, likely corresponding to economic downturns or recessions. These spikes are evident across all education levels and races, indicating the widespread impact of such economic events.

  3. Recovery Patterns: Post-recession or economic downturn periods show a decline in unemployment rates, indicating economic recovery. However, the speed and extent of recovery vary based on education levels. Those with higher education tend to recover faster and more significantly than those with lower education levels.

Next, we'll proceed with the correlation analysis to see how closely unemployment rates at different education levels track one another.

Correlation Analysis

To understand how unemployment rates at different education levels move together, we'll compute pairwise Pearson correlation coefficients between the four education-level series. Because the education columns are not split by race, these correlations describe the aggregate series rather than any single racial group. A coefficient close to 1 indicates a strong positive relationship, while a coefficient close to -1 indicates a strong negative relationship. A coefficient close to 0 suggests a weak or no linear relationship.
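
For reference, the Pearson coefficient can be computed by hand: it is the covariance of two series divided by the product of their standard deviations. A minimal sketch in plain Python, using the five January values of Primary_School and High_School from the data.head() output; the `pearson` helper is illustrative and not part of pandas:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance up to a constant factor)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances up to the same factor)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# January 2010-2014 values from data.head(), not the full dataset
primary_school = [15.3, 14.3, 13.0, 12.0, 9.4]
high_school = [10.2, 9.5, 8.5, 8.1, 6.5]

r = pearson(primary_school, high_school)
print(round(r, 3))
```

On the full dataset, `data[education_columns].corr()` computes the same coefficient for every pair of columns.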

We'll visualize these correlations using a heatmap for a clearer representation.

import seaborn as sns

# Computing the correlation matrix for education levels
correlation_matrix = data[education_columns].corr()

# Visualizing the correlation matrix using a heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1, center=0)
plt.title('Correlation Between Unemployment Rates Across Education Levels')
plt.show()

Observations from the Correlation Analysis

  1. Strong Positive Correlations: The heatmap indicates strong positive correlations between unemployment rates across different education levels. This suggests that when unemployment rates increase for one education level, they tend to increase for other education levels as well.

  2. Primary School and High School: There's a very strong positive correlation between unemployment rates for individuals with primary school education and those with high school education. This indicates that these two groups often experience similar trends in the job market.

  3. Associates Degree and Professional Degree: Similarly, there's a strong positive correlation between unemployment rates for individuals with an associate's degree and those with a professional degree. This suggests that higher education levels often experience parallel trends in unemployment.

Next, we'll proceed with a comparative analysis to compare the unemployment rates of one racial group with the average unemployment rate of all groups combined.

Comparative Analysis

In this section, we'll compare the unemployment rates of each racial group with the average unemployment rate of all groups combined. This will provide a clearer perspective on which racial groups have unemployment rates above or below the average.

We'll visualize this comparison using line plots, where each line represents a racial group's unemployment rate over the years, and an additional line will represent the average unemployment rate of all groups.
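
The comparison described above can be sketched as follows. This assumes only the five January rows from the data.head() output, and it uses an unweighted mean of the four race columns as the "average of all groups", which is an assumption: it is not the population-wide rate, since group sizes are not in the dataset. `sample`, `Race_Average`, and `deviation` are illustrative names:

```python
import pandas as pd

race_columns = ['White', 'Black', 'Asian', 'Hispanic']

# The five January rows from data.head() (2010-2014), not the full dataset
sample = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'White': [8.8, 8.1, 7.4, 7.1, 5.7],
    'Black': [16.5, 15.8, 13.6, 13.7, 12.1],
    'Asian': [8.3, 6.8, 6.7, 6.4, 4.7],
    'Hispanic': [12.9, 12.3, 10.7, 9.7, 8.3],
})

# Unweighted mean of the four group rates in each row
sample['Race_Average'] = sample[race_columns].mean(axis=1)

# Positive values: the group sits above the average; negative: below
deviation = sample[race_columns].sub(sample['Race_Average'], axis=0)
deviation.insert(0, 'Year', sample['Year'])

print(deviation.round(2))
```

Plotting `Race_Average` alongside each race column with `plt.plot` would give the line chart described above.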

list(data.columns)
['Year',
 'Month',
 'Primary_School',
 'Date',
 'High_School',
 'Associates_Degree',
 'Professional_Degree',
 'White',
 'Black',
 'Asian',
 'Hispanic',
 'Men',
 'Women']
# Creating separate dataframes for each race
race_dfs = {}

for race in race_columns:
    # Extracting columns related to the current race
    columns_for_race = [col for col in data.columns if race in col]

    # Creating a dataframe for the current race
    race_dfs[race] = data[['Year'] + columns_for_race].groupby('Year').mean().reset_index()

# Displaying the dataframe for 'White' as an example
race_dfs['White'].head()
Year White
0 2010 8.716667
1 2011 7.933333
2 2012 7.191667
3 2013 6.508333
4 2014 5.300000
# Creating separate dataframes for each race excluding columns of other races
race_dfs_exclusive = {}

for race in race_columns:
    # Extracting columns not related to other races
    columns_exclusive = [col for col in data.columns if not any(other_race in col for other_race in race_columns if other_race != race)]

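    # Creating a dataframe for the current race.
    # NOTE: columns_exclusive already contains 'Year', so prepending it here
    # produces the duplicate 'Year' column seen in the output below.
    race_dfs_exclusive[race] = data[['Year'] + columns_exclusive]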

# Displaying the dataframe for 'White' as an example
race_dfs_exclusive['White'].head()
Year Year Month Primary_School Date High_School Associates_Degree Professional_Degree White Men Women
0 2010 2010 Jan 15.3 Jan-2010 10.2 8.6 4.9 8.8 10.2 7.9
1 2011 2011 Jan 14.3 Jan-2011 9.5 8.1 4.3 8.1 9.0 7.9
2 2012 2012 Jan 13.0 Jan-2012 8.5 7.1 4.3 7.4 7.7 7.6
3 2013 2013 Jan 12.0 Jan-2013 8.1 6.9 3.8 7.1 7.5 7.2
4 2014 2014 Jan 9.4 Jan-2014 6.5 5.9 3.3 5.7 6.2 5.8
race_dfs_exclusive['Black'].head(400)
2023-08-08T05:26:52.524918 [warning  ] duplicate columns found: {'Year': 1} [dx.utils.formatting] filename=formatting.py func_name=check_for_duplicate_columns lineno=86
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

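# NOTE: education_levels is empty and the dataset has no '{race}_Men_{edu}'
# columns, so the function below draws empty axes until both are supplied.
education_levels = []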

# Define a function to plot 3D bar chart for each race's men unemployment rate
def plot_3d_bar_for_race(df, race):
    fig = plt.figure(figsize=(10, 6))
    ax = fig.add_subplot(111, projection='3d')

    # Define the positions for y-axis (education levels)
    education_positions = np.arange(len(education_levels))

    # For each year and education level, plot the bar
    for i, edu in enumerate(education_levels):
        men_col = f'{race}_Men_{edu}'
        if men_col in df.columns:
            ax.bar(df['Year'], df[men_col], zdir='y', width=0.5, bottom=i, zs=i)

    ax.set_xlabel('Year')
    ax.set_ylabel('Education Level')
    ax.set_zlabel('Men Unemployment Rate')
    ax.set_yticks(education_positions)
    ax.set_yticklabels(education_levels)
    ax.set_title(f'Men Unemployment Rate for {race} Race')
    plt.tight_layout()
    plt.show()

# Plotting for 'White' race as an example
plot_3d_bar_for_race(race_dfs_exclusive['White'], 'White')
# Define the columns to extract
columns_to_extract = [
    "Year",
    "Primary_School",
    "High_School",
    "Associates_Degree",
    "Professional_Degree",
]

# Extract the columns from the original dataframe
black_subdf = race_dfs_exclusive['Black'][columns_to_extract].copy()
white_subdf = race_dfs_exclusive['White'][columns_to_extract].copy()

asian_subdf = race_dfs_exclusive['Asian'][columns_to_extract].copy()
hispanic_subdf = race_dfs_exclusive['Hispanic'][columns_to_extract].copy()




black_subdf.head()
hispanic_subdf.head()
asian_subdf.head()
white_subdf.head()



fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

# check the lengths of the dimensions we want to use :

subset_black_subdf = black_subdf.head(132)



#ax.scatter(subset_black_subdf['Year'], subset_black_subdf['High_School'], subset_black_subdf['Professional_Degree'])

#print(len(black_subdf['Year']))
#Plot using the columns 'X', 'Y', and 'Z' from the DataFrame
#ax.scatter(black_subdf['Year'].iloc[:132], black_subdf['High_School'].iloc[:132], black_subdf['Professional_Degree'].iloc[:132])


# Setting labels
#ax.set_xlabel('X Label')
#ax.set_ylabel('Y Label')
#ax.set_zlabel('Z Label')

#plt.show()
subset_black_subdf.shape
(132, 6)
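fig = plt.figure()
# Attach fresh 3D axes to this figure; the old `ax` belonged to the previous figure
ax = fig.add_subplot(111, projection='3d')

# 'Year' is duplicated in subset_black_subdf, so select the first copy as a Series
year = subset_black_subdf['Year'].iloc[:, 0]
ax.scatter(year, subset_black_subdf['High_School'], subset_black_subdf['Professional_Degree'])
# Setting labels
ax.set_xlabel('Year')
ax.set_ylabel('High School Unemployment Rate (%)')
ax.set_zlabel('Professional Degree Unemployment Rate (%)')

print(subset_black_subdf['Year'])
plt.show()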
print(subset_black_subdf['Year'])
     Year  Year
0    2010  2010
1    2011  2011
2    2012  2012
3    2013  2013
4    2014  2014
..    ...   ...
127  2016  2016
128  2017  2017
129  2018  2018
130  2019  2019
131  2020  2020

[132 rows x 2 columns]
print(subset_black_subdf['High_School'])
0      10.2
1       9.5
2       8.5
3       8.1
4       6.5
       ... 
127     5.1
128     4.2
129     3.8
130     3.7
131     NaN
Name: High_School, Length: 132, dtype: float64
print(subset_black_subdf['Professional_Degree'])
0      4.9
1      4.3
2      4.3
3      3.8
4      3.3
      ... 
127    2.5
128    2.2
129    2.2
130    1.9
131    NaN
Name: Professional_Degree, Length: 132, dtype: float64